Building a SMILE to Python Translator

SMILE is the expression language used in the XMILE format, so parsing XMILE files involves parsing strings of SMILE code. To do this we need to understand the SMILE grammar and, more importantly, Python needs to know how to parse it as well.

In this notebook we'll use parsimonious to interpret our grammar, parse our strings, and return an Abstract Syntax Tree (AST). There are a variety of other tools we could use:

  • PLY - 55397 downloads in the last month
  • plex - 949 downloads in the last month
  • tokenizertools - 642 downloads in the last month
  • pyparsing - 221150 downloads in the last month
  • ANTLR - not python native
  • others

Parsimonious seems to strike a good balance between new, high-level functionality and maturity.

We will use a Parsing Expression Grammar (PEG) to specify how parsimonious should interpret our input strings. Here is a good slide deck of how PEG works, here is the original paper describing the concept, and here are some reasonable tutorials.

PEG compares to a few other syntaxes for describing grammar:

Here are some examples of parsimonious in action

Parsimonious has its own spin on PEG (mostly replacing <- with =) and those changes are listed on the main github page.

We're just building a translator, but if we wanted to build a full-out interpreter, here is how we should do it:

So, in general, how does this work?

The parser starts with the first rule in the grammar and tries to match it against the input. If the first rule fails, the whole parse fails.

Regular expressions are included with the syntax ~"expression".

Statements of the form a / b / etc. express ordered choice: the parser first tries to match the string element as type a; if that fails it tries b, and so on.

As with regular expressions, a trailing +, ?, or * denotes the number of times the preceding pattern should be matched.

For example, in this grammar:

grammar = """
Term     = Factor Additive*
Additive= ("+"/"-") Factor

Factor   = Primary Multiplicative*
Multiplicative = ("*" / "/") Primary

Primary  = Parens / Neg / Number 
Parens   = "(" Term ")"
Neg      = "-" Primary
Number   = ~"[0-9]+"
"""

if we try to parse "5+3", the parser looks at the first rule, Term, and says: 'If this is going to match, then the first component needs to be a Factor.' So it looks at the definition of Factor and says: 'If this is going to match, then the first component needs to be a Primary.' Then it looks at the definition of Primary and says: 'This might be a Parens, let's check.' It looks at the definition of Parens, finds that the first element, "(", does not match the 5, says 'nope!' and goes back up a level to Primary.

It then checks whether the start of the string fits the Neg pattern, discovers that it doesn't, returns to the Primary definition, and checks the third option: Number. It goes to the definition of Number and says: 'Hey, at least the first character matches Number, but Number asks for one or more characters between 0 and 9, so let's check the next character. It is a +, which doesn't fit the pattern, so we'll capture 5 as a Number.' It then returns up to Primary and, as there is nothing else to match in Primary, returns to Factor.

Now, Factor asks for zero or more Multiplicative components, so the parser checks whether the rest of the string (now with the 5 consumed) matches the Multiplicative pattern. The first element of a Multiplicative component should be '*' or '/', and it isn't, so the parser pops back up to Factor, and then to Term.

Term then goes on to see whether the string (starting at '+') matches the Additive pattern. It sees that the '+' matches its first condition, and then goes on to check for a Factor, beginning with the '3' in the string. This follows the same path as we saw before, matching the 3 as a Factor.

The parser collects all of the components into a tree and returns it to the user.
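To make the walk-through above concrete, here is a small hand-written recognizer that follows the same rules in plain Python. It is only a sketch for illustration, not how parsimonious is implemented: the function names (primary, repeat, parses and so on) are invented here, and it only reports whether a string matches rather than building a tree.

```python
import re

DIGITS = re.compile(r"[0-9]+")

def number(s, i):
    # Number = ~"[0-9]+" : index just past the digits, or None on failure
    m = DIGITS.match(s, i)
    return m.end() if m else None

def parens(s, i):
    # Parens = "(" Term ")"
    if s.startswith("(", i):
        end = term(s, i + 1)
        if end is not None and s.startswith(")", end):
            return end + 1
    return None

def neg(s, i):
    # Neg = "-" Primary
    return primary(s, i + 1) if s.startswith("-", i) else None

def primary(s, i):
    # Primary = Parens / Neg / Number : ordered choice, first success wins
    for alternative in (parens, neg, number):
        end = alternative(s, i)
        if end is not None:
            return end
    return None

def repeat(first, operators, operand):
    # Builds a matcher for rules shaped like: first (operator operand)*
    def rule(s, i):
        end = first(s, i)
        while end is not None:
            if end < len(s) and s[end] in operators:
                following = operand(s, end + 1)
                if following is not None:
                    end = following
                    continue
            return end  # zero or more repetitions matched; stop here
        return None     # the mandatory first component failed
    return rule

factor = repeat(primary, "*/", primary)   # Factor = Primary Multiplicative*
term = repeat(factor, "+-", factor)       # Term   = Factor Additive*

def parses(s):
    # As with parsimonious' parse(), the whole string must be consumed
    return term(s, 0) == len(s)

print(parses("5+3"), parses("5+"))
```

Each function returns the position it reached or None, which is the same succeed-or-back-up behaviour described in the prose: a failed alternative consumes nothing, and the caller simply tries the next option.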


In [1]:
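import parsimonious
# Assumption: the eval() calls in later cells need these names in scope
# (the original session presumably provided them, e.g. via %pylab).
import numpy as np
from math import cos, sin, exp
from numpy.random import exponential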

Start with someone else's arithmetic grammar

by Philippe Sigaud, available here

This is a good example of how to get around the left-recursion issue: a PEG rule cannot begin by referring to itself, so repetition is written with * instead.

Our translator will have several parts, that we can see here.

  1. First, we define the grammar and compile it
  2. Then we define a function to parse the Abstract Syntax Tree and translate any of its components (here translation is just to return a stringified version)
  3. Then we parse the string we're interested in translating to an AST
  4. Finally, we crawl the AST, compiling an output string.

In [2]:
#define the grammar
grammar = """
Term     = Factor (Add / Sub)*
Add      = "+" Factor
Sub      = "-" Factor
Factor   = Primary (Mul / Div)*
Mul      = "*" Primary
Div      = "/" Primary
Primary  = Parens / Neg / Number 
Parens   = "(" Term ")"
Neg      = "-" Primary
Number   = ~"[0-9]+"
"""
g = parsimonious.Grammar(grammar)

def to_str(node):
    if node.children:
        return ''.join([to_str(child) for child in node])
    else:
        return node.text

AST = g.parse("2+3")    
eval(to_str(AST))


Out[2]:
5

Simplify

Now, we don't care about the difference between addition and subtraction, or between multiplication and division, as the translator treats each pair identically, so let's simplify the grammar by merging them.


In [3]:
grammar = """
Term     = Factor Additive*
Additive= ("+"/"-") Factor

Factor   = Primary Multiplicative*
Multiplicative = ("*" / "/") Primary

Primary  = Parens / Neg / Number 
Parens   = "(" Term ")"
Neg      = "-" Primary
Number   = ~"[0-9]+"
"""
g = parsimonious.Grammar(grammar)

g.parse("2+3")

def to_str(node):
    if node.children:
        return ''.join([to_str(child) for child in node])
    else:
        return node.text

eval(to_str(g.parse("2+3*4")))


Out[3]:
14

Add floating point numbers

Now we'll go with a more complex number definition to try and capture floats


In [4]:
grammar = """
Term     = Factor Additive*
Additive= ("+"/"-") Factor
Factor   = Primary Multiplicative*
Multiplicative = ("*" / "/") Primary
Primary  = Parens / Neg / Number 
Parens   = "(" Term ")"
Neg      = "-" Primary
Number   = ((~"[0-9]"+ "."? ~"[0-9]"*) / ("." ~"[0-9]"+)) (("e"/"E") ("-"/"+") ~"[0-9]"+)?
"""
g = parsimonious.Grammar(grammar)

g.parse("2+3")

def to_str(node):
    if node.children:
        return ''.join([to_str(child) for child in node])
    else:
        return node.text

eval(to_str(g.parse("2.1+3*-4.2*.3")))


Out[4]:
-1.6800000000000002
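It is worth checking which literals the new Number rule actually accepts. The sketch below collapses the rule into a single regular expression that is intended to be equivalent (it was written for this check and is not part of the grammar above), and tries it on a few sample strings. Note that the exponent part requires an explicit sign, so a literal like 1e5 is not matched in full.

```python
import re

# Mirror of the Number rule:
#   (digits "."? digits*  /  "." digits+)  followed by an optional  e/E sign digits+
NUMBER = re.compile(r"(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+][0-9]+)?")

def is_number(text):
    # fullmatch: the entire string has to be consumed, as with Grammar.parse
    return NUMBER.fullmatch(text) is not None

samples = ["3", "2.1", "3.", ".3", "1e-5", "6.02E+23", "1e5", ".", "e+5"]
results = {s: is_number(s) for s in samples}

for literal, ok in results.items():
    print(literal, ok)
```

If unsigned exponents should be accepted, one option would be to make the sign optional in the grammar, for example ("-"/"+")? in place of ("-"/"+").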

Identifiers

If we want to include variables in our expressions, we need to be able to handle identifiers. Let's practice with a small standalone grammar to get it right.


In [5]:
grammar = """
Identifier = !Keyword ~"[a-z]" ~"[a-z0-9_\$]"* 
Keyword = 'int'
"""
g = parsimonious.Grammar(grammar)

def to_str(node):
    if node.children:
        return ''.join([to_str(child) for child in node])
    else:
        return node.text

hi=4    
eval(to_str(g.parse("hi")))


Out[5]:
4
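The !Keyword part of the Identifier rule is a negative lookahead: it consumes nothing, but makes the rule fail if a keyword starts at the current position. The same idea can be sketched with the standard re module; the pattern and helper name below are stand-ins written for this illustration, with 'int' as the only keyword as in the cell above.

```python
import re

KEYWORDS = ["int"]

# (?!...) is the regex spelling of PEG's !Keyword: fail if a keyword starts here
IDENTIFIER = re.compile(r"(?!" + "|".join(KEYWORDS) + r")[a-z][a-z0-9_$]*")

def is_identifier(text):
    return IDENTIFIER.fullmatch(text) is not None

for name in ["hi", "print", "int", "integer"]:
    print(name, is_identifier(name))
```

Here hi and print are accepted, int is rejected as intended, but integer is rejected too, because the lookahead only inspects the first three characters. The same limitation applies to the grammar rule, and the notebook comes back to it further down.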

Now let's add the identifiers to the arithmetic we were working on previously.


In [6]:
grammar = """
Term     = Factor Additive*
Additive= ("+"/"-") Factor
Factor   = Primary Multiplicative*
Multiplicative = ("*" / "/") Primary
Primary  = Parens / Neg / Number / Identifier
Parens   = "(" Term ")"
Neg      = "-" Primary
Number   = ((~"[0-9]"+ "."? ~"[0-9]"*) / ("." ~"[0-9]"+)) (("e"/"E") ("-"/"+") ~"[0-9]"+)?
Identifier = !Keyword ~"[a-z]" ~"[a-z0-9_\$]"* 
Keyword = 'int' / 'exp'
"""
g = parsimonious.Grammar(grammar)

def to_str(node):
    if node.children:
        return ''.join([to_str(child) for child in node])
    else:
        return node.text

    
hi=4    
eval(to_str(g.parse("(5+hi)*3.1+.5")))


Out[6]:
28.400000000000002

Add function calls

Function calls are a primary unit in the order of operations. We explicitly spell out the keywords that may be used as function names; any other name followed by an argument list will fail to parse. For starters, let's use just a few functions that Python can already handle, so we don't have to worry about translation.


In [7]:
grammar = """
Term     = Factor Additive*
Additive= ("+"/"-") Factor
Factor   = Primary Multiplicative*
Multiplicative = ("*" / "/") Primary
Primary  = Call / Parens / Neg / Number / Identifier
Parens   = "(" Term ")"
Call     = Keyword "(" Term ("," Term)* ")"
Neg      = "-" Primary
Number   = ((~"[0-9]"+ "."? ~"[0-9]"*) / ("." ~"[0-9]"+)) (("e"/"E") ("-"/"+") ~"[0-9]"+)?
Identifier = !Keyword ~"[a-z]" ~"[a-z0-9_\$]"* 
Keyword = 'exp' / 'sin' / 'cos'
"""
g = parsimonious.Grammar(grammar)

def to_str(node):
    if node.children:
        return ''.join([to_str(child) for child in node])
    else:
        return node.text
 
hi=4    
eval(to_str(g.parse("cos(5+hi)*3.1+.5")))


Out[7]:
-2.3245038118424985
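The cells in this notebook call eval() on the translated string and rely on cos, hi and friends being defined in the session. A slightly more explicit variant, sketched below, passes the allowed names in a dictionary. The namespace contents and the evaluate helper are assumptions made for this example, and emptying __builtins__ only narrows what the string can reach; it is not a security sandbox.

```python
import math

# Names the translated expression is allowed to use (chosen for this example)
namespace = {"cos": math.cos, "sin": math.sin, "exp": math.exp, "hi": 4}

def evaluate(expression, names):
    # Globals carry an empty __builtins__; names are looked up in `names` only
    return eval(expression, {"__builtins__": {}}, names)

value = evaluate("cos(5+hi)*3.1+.5", namespace)
print(value)
```

With this arrangement a misspelled or untranslated function name raises a NameError at evaluation time instead of silently picking up whatever happens to be defined in the notebook.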

Add exponentiation

Exponentiation adds another layer to our order of operations. It binds more tightly than multiplication and division, so it sits just above the primary elements. The rules are listed from lowest to highest precedence, that is, from the largest equation unit to the smallest.

As the python syntax for exponentiation is ** instead of the SMILE standard ^, we have to make our first translation. We do this by making a special case in the translation function which knows specifically what to do with an exponentive node when it sees one.


In [8]:
grammar = """
Term     = Factor Additive*
Additive= ("+"/"-") Factor

Factor   = ExpBase Multiplicative*
Multiplicative = ("*" / "/") ExpBase

ExpBase  = Primary Exponentive*
Exponentive = "^" Primary

Primary  = Call / Parens / Neg / Number / Identifier
Parens   = "(" Term ")"
Call     = Keyword "(" Term ("," Term)* ")"
Neg      = "-" Primary
Number   = ((~"[0-9]"+ "."? ~"[0-9]"*) / ("." ~"[0-9]"+)) (("e"/"E") ("-"/"+") ~"[0-9]"+)?
Identifier = !Keyword ~"[a-z]" ~"[a-z0-9_\$]"* 

Keyword = 'exp' / 'sin' / 'cos'
"""
g = parsimonious.Grammar(grammar)

def translate(node):
    if node.expr_name == 'Exponentive': #special case for translating exponent syntax
        return '**' + ''.join([translate(child) for child in node.children[1:]])
    else:
        if node.children:
            return ''.join([translate(child) for child in node])
        else:
            return node.text
 
hi=4    
eval(translate(g.parse("cos(sin(5+hi))*3.1^2+.5")))
#eval(translate(g.parse("3+cos(5+hi)*3.1^2+.5")))
#translate(g.parse("cos(5+hi)*3.1^2+.5"))


Out[8]:
9.3053961907966638
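One thing to keep in mind: the translator only swaps the text ^ for **, so the final grouping is decided by Python's own rules when the string is evaluated, not by the shape of our parse tree. Python's ** groups right to left and binds more tightly than a leading minus sign. The quick check below uses plain Python strings (no grammar involved) to show what that means.

```python
# Chained powers: Python groups ** from right to left
right_to_left = eval("2**3**2")    # evaluated as 2**(3**2)
left_to_right = eval("(2**3)**2")

# Leading minus: ** is applied before the unary minus
minus_after = eval("-2**2")        # evaluated as -(2**2)
minus_first = eval("(-2)**2")

print(right_to_left, left_to_right, minus_after, minus_first)
```

In the grammar above, Neg is a Primary, so "-2^2" is parsed with the minus attached to the 2 before the exponent is seen, yet the translated string "-2**2" evaluates to -4. Whether that matches the intended SMILE behaviour should be checked against the XMILE specification; if it does not, the translation function could wrap Neg nodes in parentheses.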

Add translation of keywords

As the names of functions in XMILE do not always match the names of functions in Python, we'll add a dictionary to translate them, and a special case in the translation function that handles Keyword nodes.


In [9]:
# try translating keywords
# It's important to list the keywords in the right order (longer names first),
# so that 'exp' does not match the start of 'exprnd'.

grammar = """
Term     = Factor Additive*
Additive= ("+"/"-") Factor

Factor   = ExpBase Multiplicative*
Multiplicative = ("*" / "/") ExpBase

ExpBase  = Primary Exponentive*
Exponentive = "^" Primary

Primary  = Call / Parens / Neg / Number / Identifier
Parens   = "(" Term ")"
Call     = Keyword "(" Term ("," Term)* ")"
Neg      = "-" Primary
Number   = ((~"[0-9]"+ "."? ~"[0-9]"*) / ("." ~"[0-9]"+)) (("e"/"E") ("-"/"+") ~"[0-9]"+)?
Identifier = !Keyword ~"[a-z]" ~"[a-z0-9_\$]"* 

Keyword = 'exprnd' / 'exp' / 'sin' / 'cos'
"""
g = parsimonious.Grammar(grammar)

dictionary = {'exp':'exp', 'sin':'sin', 'cos':'cos', 'exprnd':'exponential'}

def translate(node):
    if node.expr_name == 'Exponentive': 
        return '**' + ''.join([translate(child) for child in node.children[1:]])
    elif node.expr_name == "Keyword": # special case for translating keywords
        return dictionary[node.text]
    else:
        if node.children:
            return ''.join([translate(child) for child in node])
        else:
            return node.text
 
hi=4    
eval(translate(g.parse("exprnd(5+hi)*3.1^2+.5")))
#translate(g.parse("cos(5+hi)*3.1^2+.5"))


Out[9]:
61.12136331904043
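The ordering comment in the cell above follows directly from how ordered choice works: the first alternative that matches wins, even if a later one would have matched more of the input. The toy function below imitates that behaviour for plain string alternatives; first_match is a made-up helper for this illustration, not a parsimonious function.

```python
def first_match(alternatives, text):
    # Ordered choice over literals: return the first one that prefixes `text`
    for alternative in alternatives:
        if text.startswith(alternative):
            return alternative
    return None

wrong_order = first_match(["exp", "exprnd"], "exprnd(5)")
right_order = first_match(["exprnd", "exp"], "exprnd(5)")
print(wrong_order, right_order)

# One way to get a safe order automatically: longest keywords first
keywords = ["exp", "sin", "cos", "exprnd"]
safe_order = sorted(keywords, key=len, reverse=True)
print(safe_order)
```

With 'exp' listed first, the Call rule would match the keyword exp and then expect "(", find "rnd(" instead, and fail.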

Add XMILE keywords

Now that this structure is in place, let's add a bunch more keywords.


In [10]:
# expand the keywords to include a goodly XMILE subset
grammar = """
Term     = Factor Additive*
Additive= ("+"/"-") Factor

Factor   = ExpBase Multiplicative*
Multiplicative = ("*" / "/") ExpBase

ExpBase  = Primary Exponentive*
Exponentive = "^" Primary

Primary  = Call / Parens / Neg / Number / Identifier
Parens   = "(" Term ")"
Call     = Keyword "(" Term ("," Term)* ")"
Neg      = "-" Primary
Number   = ((~"[0-9]"+ "."? ~"[0-9]"*) / ("." ~"[0-9]"+)) (("e"/"E") ("-"/"+") ~"[0-9]"+)?
Identifier = !Keyword ~"[a-z]" ~"[a-z0-9_\$]"* 

Keyword = "exprnd" / "exp" / "sin" / "cos" / "abs" / "int" / "inf" / "log10" / "pi" /
          "sqrt" / "tan" / "lognormal" / "normal" / "poisson" / "ln" / "min" / "max" /
          "random" / "arccos" / "arcsin" / "arctan" / "if_then_else"
"""

g = parsimonious.Grammar(grammar)

dictionary = {"abs":"abs", "int":"int", "exp":"np.exp", "inf":"np.inf", "log10":"np.log10",
              "pi":"np.pi", "sin":"np.sin", "cos":"np.cos", "sqrt":"np.sqrt", "tan":"np.tan",
              "lognormal":"np.random.lognormal", "normal":"np.random.normal", 
              "poisson":"np.random.poisson", "ln":"np.log", "exprnd":"np.random.exponential",
              "random":"np.random.rand", "min":"min", "max":"max", "arccos":"np.arccos",
              "arcsin":"np.arcsin", "arctan":"np.arctan", "if_then_else":"if_then_else"}

#provide a few functions
def if_then_else(condition, val_if_true, val_if_false):
    if condition:
        return val_if_true
    else:
        return val_if_false

def translate(node):
    if node.expr_name == 'Exponentive': # special case syntax change...
        return '**' + ''.join([translate(child) for child in node.children[1:]])
    elif node.expr_name == "Keyword":
        return dictionary[node.text]
    else:
        if node.children:
            return ''.join([translate(child) for child in node])
        else:
            return node.text
        
hi = 4        
eval(translate(g.parse("cos(min(hi,6)+5)*3.1^2+.5")))


Out[10]:
-8.2559618167117463
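A caveat about the if_then_else helper: because it is an ordinary Python function, both branch arguments are evaluated before the function is called, even though only one of them is returned. The sketch below shows a case where that matters, and contrasts it with Python's conditional expression, which evaluates only the branch it needs. The example strings and the names dictionary are made up for this illustration.

```python
def if_then_else(condition, val_if_true, val_if_false):
    return val_if_true if condition else val_if_false

names = {"if_then_else": if_then_else, "x": 0}

eager = "if_then_else(x == 0, 0, 1 / x)"   # both branches evaluated up front
lazy = "(0 if x == 0 else 1 / x)"          # only the chosen branch evaluated

try:
    eval(eager, dict(names))
    eager_result = "ok"
except ZeroDivisionError:
    eager_result = "ZeroDivisionError"

lazy_result = eval(lazy, dict(names))
print(eager_result, lazy_result)
```

If guarded divisions like this show up in real models, the translation function could emit the conditional-expression form for if_then_else calls instead of a function call.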

Conditional behavior

One of the XMILE functions expects a boolean parameter, so we had better add the ability to deal with conditional statements. Comparisons bind even more loosely than addition and subtraction and happen last in the order of operations, so naturally they come first in our grammar.


In [11]:
grammar = """
Condition = Term Conditional*
Conditional = ("<=" / "<" / ">=" / ">" / "=") Term

Term     = Factor Additive*
Additive = ("+"/"-") Factor

Factor   = ExpBase Multiplicative*
Multiplicative = ("*" / "/") ExpBase

ExpBase  = Primary Exponentive*
Exponentive = "^" Primary

Primary  = Call / Parens / Neg / Number / Identifier 
Parens   = "(" Condition ")"
Call     = Keyword "(" Condition ("," Condition)* ")"
Neg      = "-" Primary
Number   = ((~"[0-9]"+ "."? ~"[0-9]"*) / ("." ~"[0-9]"+)) (("e"/"E") ("-"/"+") ~"[0-9]"+)?
Identifier = !Keyword ~"[a-z]" ~"[a-z0-9_\$]"* 

Keyword = "exprnd" / "exp" / "sin" / "cos" / "abs" / "int" / "inf" / "log10" / "pi" /
          "sqrt" / "tan" / "lognormal" / "normal" / "poisson" / "ln" / "min" / "max" /
          "random" / "arccos" / "arcsin" / "arctan" / "if_then_else"
"""
g = parsimonious.Grammar(grammar)


dictionary = {"abs":"abs", "int":"int", "exp":"np.exp", "inf":"np.inf", "log10":"np.log10",
              "pi":"np.pi", "sin":"np.sin", "cos":"np.cos", "sqrt":"np.sqrt", "tan":"np.tan",
              "lognormal":"np.random.lognormal", "normal":"np.random.normal", 
              "poisson":"np.random.poisson", "ln":"np.log", "exprnd":"np.random.exponential",
              "random":"np.random.rand", "min":"min", "max":"max", "arccos":"np.arccos",
              "arcsin":"np.arcsin", "arctan":"np.arctan", "if_then_else":"if_then_else"}

#provide a few functions
def if_then_else(condition, val_if_true, val_if_false):
    if condition:
        return val_if_true
    else:
        return val_if_false

def translate(node):
    if node.expr_name == 'Exponentive': # special case syntax change...
        return '**' + ''.join([translate(child) for child in node.children[1:]])
    elif node.expr_name == "Keyword":
        return dictionary[node.text]
    else:
        if node.children:
            return ''.join([translate(child) for child in node])
        else:
            return node.text
 
hi=4    
eval(translate(g.parse("exprnd(if_then_else(5>6,4,3)+hi)*3.1^2+.5")))
#eval(translate(g.parse("if_then_else(5>6,4,3)")))
#eval(translate(g.parse("int(5<=6)")))
#eval(translate(g.parse("5<=6")))
#eval(translate(g.parse("if_then_else(6,4,3)")))
#eval(translate(g.parse("cos(min(5,hi,7)+5)*3.1^2+.5")))
#eval(translate(g.parse("cos(sin(5)+hi)*3.1^2+.5")))
#translate(g.parse("cos(5+hi)*3.1^2+.5"))
translate(g.parse("absolutely_nothing"))


---------------------------------------------------------------------------
ParseError                                Traceback (most recent call last)
<ipython-input-11-5998cd469d36> in <module>()
     60 #eval(translate(g.parse("cos(sin(5)+hi)*3.1^2+.5")))
     61 #translate(g.parse("cos(5+hi)*3.1^2+.5"))
---> 62 translate(g.parse("absolutely_nothing"))

/Library/Python/2.7/site-packages/parsimonious/grammar.pyc in parse(self, text, pos)
     81     def parse(self, text, pos=0):
     82         """Parse some text with the default rule."""
---> 83         return self.default_rule.parse(text, pos=pos)
     84 
     85     def match(self, text, pos=0):

/Library/Python/2.7/site-packages/parsimonious/expressions.pyc in parse(self, text, pos)
     38 
     39         """
---> 40         node = self.match(text, pos=pos)
     41         if node.end < len(text):
     42             raise IncompleteParseError(text, node.end, self)

/Library/Python/2.7/site-packages/parsimonious/expressions.pyc in match(self, text, pos)
     55         node = self._match(text, pos, {}, error)
     56         if node is None:
---> 57             raise error
     58         return node
     59 

ParseError: Rule 'Condition' didn't match at 'absolutely_nothing' (line 1, column 1).

Deal with identifiers that start with keywords

If we give the previous grammar a test case like "absolutely_nothing", something intended to be an identifier, the !Keyword lookahead sees the keyword abs at the start of the name and rejects it, so the parse fails.

One way to deal with this is to say that an identifier is either something that does not start with a keyword, or a keyword followed by at least one other character.

Identifier = (!Keyword ~"[a-z]" ~"[a-z0-9_\$]"*) / (Keyword ~"[a-z0-9_\$]"+)

This is also problematic, as the resulting tree contains a Keyword node, and the translator replaces that keyword text with its Python equivalent.

Better to just make it a simple terminal:

Identifier =  ~"[a-z]" ~"[a-z0-9_\$]"*

and count on the fact that Call (and therefore Keyword) is tried before Identifier in the Primary rule:

Primary  = Call / Parens / Neg / Number / Identifier 

In [ ]:
grammar = """
Condition = Term Conditional*
Conditional = ("<=" / "<" / ">=" / ">" / "=") Term

Term     = Factor Additive*
Additive = ("+"/"-") Factor

Factor   = ExpBase Multiplicative*
Multiplicative = ("*" / "/") ExpBase

ExpBase  = Primary Exponentive*
Exponentive = "^" Primary

Primary  = Call / Parens / Neg / Number / Identifier 
Parens   = "(" Condition ")"
Call     = Keyword "(" Condition ("," Condition)* ")"
Neg      = "-" Primary
Number   = ((~"[0-9]"+ "."? ~"[0-9]"*) / ("." ~"[0-9]"+)) (("e"/"E") ("-"/"+") ~"[0-9]"+)?
Identifier =  ~"[a-z]" ~"[a-z0-9_\$]"*

Keyword = "exprnd" / "exp" / "sin" / "cos" / "abs" / "int" / "inf" / "log10" / "pi" /
          "sqrt" / "tan" / "lognormal" / "normal" / "poisson" / "ln" / "min" / "max" /
          "random" / "arccos" / "arcsin" / "arctan" / "if_then_else"
"""
g = parsimonious.Grammar(grammar)


dictionary = {"abs":"abs", "int":"int", "exp":"np.exp", "inf":"np.inf", "log10":"np.log10",
              "pi":"np.pi", "sin":"np.sin", "cos":"np.cos", "sqrt":"np.sqrt", "tan":"np.tan",
              "lognormal":"np.random.lognormal", "normal":"np.random.normal", 
              "poisson":"np.random.poisson", "ln":"np.log", "exprnd":"np.random.exponential",
              "random":"np.random.rand", "min":"min", "max":"max", "arccos":"np.arccos",
              "arcsin":"np.arcsin", "arctan":"np.arctan", "if_then_else":"if_then_else"}

#provide a few functions
def if_then_else(condition, val_if_true, val_if_false):
    if condition:
        return val_if_true
    else:
        return val_if_false

def translate(node):
    if node.expr_name == 'Exponentive': # special case syntax change...
        return '**' + ''.join([translate(child) for child in node.children[1:]])
    elif node.expr_name == "Keyword":
        return dictionary[node.text]
    else:
        if node.children:
            return ''.join([translate(child) for child in node])
        else:
            return node.text
 

translate(g.parse("absolutely_nothing"))
translate(g.parse("normal_delivery_delay_recognized"))

Return a list of dependencies

The list contains the identifiers present in the equation.


In [ ]:
grammar = """
Condition = Term Conditional*
Conditional = ("<=" / "<" / ">=" / ">" / "=") Term

Term     = Factor Additive*
Additive = ("+"/"-") Factor

Factor   = ExpBase Multiplicative*
Multiplicative = ("*" / "/") ExpBase

ExpBase  = Primary Exponentive*
Exponentive = "^" Primary

Primary  = Call / Parens / Neg / Number / Identifier 
Parens   = "(" Condition ")"
Call     = Keyword "(" Condition ("," Condition)* ")"
Neg      = "-" Primary
Number   = ((~"[0-9]"+ "."? ~"[0-9]"*) / ("." ~"[0-9]"+)) (("e"/"E") ("-"/"+") ~"[0-9]"+)?
Identifier =  ~"[a-z]" ~"[a-z0-9_\$]"*

Keyword = "exprnd" / "exp" / "sin" / "cos" / "abs" / "int" / "inf" / "log10" / "pi" /
          "sqrt" / "tan" / "lognormal" / "normal" / "poisson" / "ln" / "min" / "max" /
          "random" / "arccos" / "arcsin" / "arctan" / "if_then_else"
"""
g = parsimonious.Grammar(grammar)


dictionary = {"abs":"abs", "int":"int", "exp":"np.exp", "inf":"np.inf", "log10":"np.log10",
              "pi":"np.pi", "sin":"np.sin", "cos":"np.cos", "sqrt":"np.sqrt", "tan":"np.tan",
              "lognormal":"np.random.lognormal", "normal":"np.random.normal", 
              "poisson":"np.random.poisson", "ln":"np.log", "exprnd":"np.random.exponential",
              "random":"np.random.rand", "min":"min", "max":"max", "arccos":"np.arccos",
              "arcsin":"np.arcsin", "arctan":"np.arctan", "if_then_else":"if_then_else"}

#provide a few functions
def if_then_else(condition, val_if_true, val_if_false):
    if condition:
        return val_if_true
    else:
        return val_if_false

def get_identifiers(node):
    identifiers = []
    for child in node:
        for item in get_identifiers(child): #merge all into one list
            identifiers.append(item)
    if node.expr_name == 'Identifier':
        identifiers.append(node.text)
    return identifiers
    
def translate(node):
    if node.expr_name == 'Exponentive': # special case syntax change...
        return '**' + ''.join([translate(child) for child in node.children[1:]])
    elif node.expr_name == "Keyword":
        return dictionary[node.text]
    else:
        if node.children:
            return ''.join([translate(child) for child in node])
        else:
            return node.text
 

a = get_identifiers(g.parse("Robert*Mary+Cora+(Edith*Sybil)^Tom+int(Matthew)*Violet".lower()))
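get_identifiers reports a name every time it occurs, so an equation that uses the same variable twice yields duplicates. If the list is meant as a set of dependencies, one possible follow-up step removes the repeats while keeping first-seen order. This is only a sketch: the helper name unique_in_order and the sample list are made up here.

```python
def unique_in_order(names):
    # dict keys keep insertion order (Python 3.7+), so this drops repeats
    # without reshuffling the remaining names
    return list(dict.fromkeys(names))

found = ["robert", "mary", "robert", "cora", "mary"]  # hypothetical result
dependencies = unique_in_order(found)
print(dependencies)
```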

In [ ]:
grammar = """
Condition = Term Conditional*
Conditional = ("<=" / "<" / ">=" / ">" / "=") Term

Term     = Factor Additive*
Additive = ("+"/"-") Factor

Factor   = ExpBase Multiplicative*
Multiplicative = ("*" / "/") ExpBase

ExpBase  = Primary Exponentive*
Exponentive = "^" Primary

Primary  = Call / Parens / Neg / Number / Identifier 
Parens   = "(" Condition ")"
Call     = Keyword "(" Condition ("," Condition)* ")"
Neg      = "-" Primary
Number   = ((~"[0-9]"+ "."? ~"[0-9]"*) / ("." ~"[0-9]"+)) (("e"/"E") ("-"/"+") ~"[0-9]"+)?
Identifier =  ~"[a-z]" ~"[a-z0-9_\$]"*

Keyword = "exprnd" / "exp" / "sin" / "cos" / "abs" / "int" / "inf" / "log10" / "pi" /
          "sqrt" / "tan" / "lognormal" / "normal" / "poisson" / "ln" / "min" / "max" /
          "random" / "arccos" / "arcsin" / "arctan" / "if_then_else"
"""
g = parsimonious.Grammar(grammar)


dictionary = {"abs":"abs", "int":"int", "exp":"np.exp", "inf":"np.inf", "log10":"np.log10",
              "pi":"np.pi", "sin":"np.sin", "cos":"np.cos", "sqrt":"np.sqrt", "tan":"np.tan",
              "lognormal":"np.random.lognormal", "normal":"np.random.normal", 
              "poisson":"np.random.poisson", "ln":"np.log", "exprnd":"np.random.exponential",
              "random":"np.random.rand", "min":"min", "max":"max", "arccos":"np.arccos",
              "arcsin":"np.arcsin", "arctan":"np.arctan", "if_then_else":"if_then_else",
              "=":"==", "<=":"<=", "<":"<", ">=":">=", ">":">", "^":"**"}

#provide a few functions
def if_then_else(condition, val_if_true, val_if_false):
    if condition:
        return val_if_true
    else:
        return val_if_false

def get_identifiers(node):
    identifiers = []
    for child in node:
        for item in get_identifiers(child): #merge all into one list
            identifiers.append(item)
    if node.expr_name == 'Identifier':
        identifiers.append(node.text)
    return identifiers
    
def translate(node):
    if node.expr_name in ['Exponentive', 'Conditional']: #non-terminal lookup
        return dictionary[node.children[0].text] + ''.join([translate(child) for child in node.children[1:]])
    elif node.expr_name == "Keyword": #terminal lookup
        return dictionary[node.text]
    else:
        if node.children:
            return ''.join([translate(child) for child in node])
        else:
            return node.text
 

translate(g.parse("2+3=4+5"))
#print g.parse("2+3=4+5")

In [ ]:
grammar = """
Condition = Term Conditional*
Conditional = ("<=" / "<" / ">=" / ">" / "=") Term

Term     = Factor Additive*
Additive = ("+"/"-") Factor

Factor   = ExpBase Multiplicative*
Multiplicative = ("*" / "/") ExpBase

ExpBase  = Primary Exponentive*
Exponentive = "^" Primary

Primary  = Call / Parens / Neg / Number / Identifier 
Parens   = "(" Condition ")"
Call     = Keyword "(" Condition ("," Condition)* ")"
Neg      = "-" Primary
Number   = ((~"[0-9]"+ "."? ~"[0-9]"*) / ("." ~"[0-9]"+)) (("e"/"E") ("-"/"+") ~"[0-9]"+)?
Identifier =  ~"[a-z]" ~"[a-z0-9_\$]"*

Keyword = "exprnd" / "exp" / "sin" / "cos" / "abs" / "int" / "inf" / "log10" / "pi" /
          "sqrt" / "tan" / "lognormal" / "normal" / "poisson" / "ln" / "min" / "max" /
          "random" / "arccos" / "arcsin" / "arctan" / "if_then_else"
"""
g = parsimonious.Grammar(grammar)


dictionary = {"abs":"abs", "int":"int", "exp":"np.exp", "inf":"np.inf", "log10":"np.log10",
              "pi":"np.pi", "sin":"np.sin", "cos":"np.cos", "sqrt":"np.sqrt", "tan":"np.tan",
              "lognormal":"np.random.lognormal", "normal":"np.random.normal", 
              "poisson":"np.random.poisson", "ln":"np.log", "exprnd":"np.random.exponential",
              "random":"np.random.rand", "min":"min", "max":"max", "arccos":"np.arccos",
              "arcsin":"np.arcsin", "arctan":"np.arctan", "if_then_else":"if_then_else",
              "=":"==", "<=":"<=", "<":"<", ">=":">=", ">":">", "^":"**"}

#provide a few functions
def if_then_else(condition, val_if_true, val_if_false):
    if condition:
        return val_if_true
    else:
        return val_if_false

def get_identifiers(node):
    # collect identifiers from the children first, then from this node itself
    identifiers = []
    for child in node:
        identifiers += get_identifiers(child)
    identifiers += [node.text] if node.expr_name in ['Identifier'] else []
    return identifiers
    
def translate(node):
    if node.expr_name in ['Exponentive', 'Conditional']: #non-terminal lookup
        return dictionary[node.children[0].text] + ''.join([translate(child) for child in node.children[1:]])
    elif node.expr_name == "Keyword": #terminal lookup
        return dictionary[node.text]
    else:
        if node.children:
            return ''.join([translate(child) for child in node])
        else:
            return node.text
 
a = get_identifiers(g.parse("Robert*Mary+Cora+(Edith*Sybil)^Tom+int(Matthew)*Violet".lower()))
print(a)
#translate(g.parse("2+3=4+5"))
#print g.parse("2+3=4+5")